Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply
Abstract
We consider the problem of building high-performance implementations of sparse matrix-vector multiply (SpM×V), or y = y + A·x, an important and ubiquitous computational kernel. Prior work indicates that cache blocking of SpM×V is extremely important for some matrix and machine combinations, with speedups as high as 3x. In this paper we present a new, more compact data structure for cache blocking for SpM×V and examine the general question of when and why performance improves. Cache blocking appears to be most effective when, simultaneously, 1) the vector x does not fit in cache, 2) the vector y fits in cache, 3) the nonzeros are distributed throughout the matrix, and 4) the nonzero density is sufficiently high. In particular, we find that cache blocking does not help with band matrices, no matter how large x and y are, since the matrix structure already lends itself to the optimal access pattern. Prior work on performance modeling assumed that the matrices were small enough for x and y to fit in the cache. When this is not the case, however, the block sizes picked by these models may perform poorly, which motivates us to update the performance models. In contrast, the optimum block sizes predicted by the new performance models generally match the measured optimum block sizes, so the models can serve as a basis for a heuristic to pick the block size. We conclude with architectural suggestions that would make processor and memory systems more amenable to SpM×V.
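As a rough illustration of the kernel the abstract describes, the sketch below computes y = y + A·x twice: once with a plain CSR loop and once after splitting A into rectangular cache blocks, each stored as its own small CSR matrix. This is not the compact data structure the paper proposes, which the abstract does not specify. The function names, the dense-list input format, and the block sizes are all assumptions chosen for readability.

```python
def csr_from_dense(rows):
    """Build CSR arrays (values, column indices, row pointers) from dense rows."""
    vals, cols, ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                vals.append(v)
                cols.append(j)
        ptr.append(len(vals))
    return vals, cols, ptr


def spmv_csr(vals, cols, ptr, x, y):
    """Reference kernel: y = y + A*x with A in CSR form (updates y in place)."""
    for i in range(len(ptr) - 1):
        acc = y[i]
        for k in range(ptr[i], ptr[i + 1]):
            acc += vals[k] * x[cols[k]]
        y[i] = acc
    return y


def cache_block(rows, row_block, col_block):
    """Split A into row_block x col_block tiles; keep each non-empty tile as CSR.

    Hypothetical layout for illustration: each entry records the tile's
    row and column offsets so indices inside the tile stay local.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    blocks = []
    for r0 in range(0, n_rows, row_block):
        for c0 in range(0, n_cols, col_block):
            tile = [row[c0:c0 + col_block] for row in rows[r0:r0 + row_block]]
            vals, cols, ptr = csr_from_dense(tile)
            if vals:  # empty tiles contribute nothing, so skip them
                blocks.append((r0, c0, vals, cols, ptr))
    return blocks


def spmv_cache_blocked(blocks, x, y):
    """y = y + A*x, one tile at a time.

    Within a tile only x[c0 : c0 + col_block] and y[r0 : r0 + row_block]
    are touched, which is the locality that cache blocking aims for.
    """
    for r0, c0, vals, cols, ptr in blocks:
        for i in range(len(ptr) - 1):
            acc = y[r0 + i]
            for k in range(ptr[i], ptr[i + 1]):
                acc += vals[k] * x[c0 + cols[k]]
            y[r0 + i] = acc
    return y


A = [[1, 0, 2, 0],
     [0, 3, 0, 4],
     [5, 0, 0, 6],
     [0, 7, 8, 0]]
x = [1, 2, 3, 4]

y_ref = spmv_csr(*csr_from_dense(A), x, [1, 1, 1, 1])
y_blk = spmv_cache_blocked(cache_block(A, 2, 2), x, [1, 1, 1, 1])
```

In a real implementation the block dimensions would be chosen so that the slices of x and y used by one tile fit in cache. The 2 x 2 tiles here are only large enough to exercise the indexing.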
Similar resources
Run-Time Reference Clustering for Cache Performance Optimization
We introduce a method for improving the cache performance of irregular computations in which data are referenced through run-time defined indirection arrays. Such computations often arise in scientific problems. The presented method, called Run-Time Reference Clustering (RTRC), is a run-time analog of a compile-time blocking used for dense matrix problems. RTRC uses the data partitioning and re...
Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure
We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this kind of structure. Our technique splits the matrix, A, into a sum, A_1 + A_2 + ... + A_s, where each term is stored in a new data str...
An Improved Sparse Matrix-Vector Multiply Based on Recursive Sparse Blocks Layout
Recursive Sparse Blocks (RSB) is a sparse matrix layout designed for coarse-grained parallelism and reduced cache misses when operating on matrices that are larger than a computer's cache. By laying out the matrix in sparse, non-overlapping blocks, we allow for the shared memory parallel execution of transposed SParse Matrix-Vector multiply (SpMV), with higher efficiency than the tradi...
Optimizing Sparse Matrix Vector Multiplication on SMPs
We describe optimizations of sparse matrix-vector multiplication on uniprocessors and SMPs. The optimization techniques include register blocking, cache blocking, and matrix reordering. We focus on optimizations that improve performance on SMPs, in particular, matrix reordering implemented using two different graph algorithms. We present a performance study of this algorithmic kernel, showing ho...
Influence of Cross-interferences on Blocked Loops: a Case Study with Matrix-vector Multiply
State-of-the-art data locality optimizing algorithms are targeted for local memories rather than for cache memories. Recent work on cache interferences seems to indicate that these phenomena can severely affect the cache performance of blocked algorithms. Because of cache conflicts, it is not possible to know the precise gain brought by blocking. It is even difficult to determine for which problem sizes b...
Publication date: 2004